12 research outputs found

    BNP-Seq: Bayesian Nonparametric Differential Expression Analysis of Sequencing Count Data

    We perform differential expression analysis of high-throughput sequencing count data under a Bayesian nonparametric framework, removing sophisticated ad-hoc pre-processing steps commonly required in existing algorithms. We propose to use the gamma (beta) negative binomial process, which takes into account different sequencing depths using sample-specific negative binomial probability (dispersion) parameters, to detect differentially expressed genes by comparing the posterior distributions of gene-specific negative binomial dispersion (probability) parameters. These model parameters are inferred by borrowing statistical strength across both the genes and samples. Extensive experiments on both simulated and real-world RNA sequencing count data show that the proposed differential expression analysis algorithms clearly outperform previously proposed ones in terms of the areas under both the receiver operating characteristic and precision-recall curves. Comment: To appear in Journal of the American Statistical Association
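As a rough illustration of the count model the abstract describes (gene-specific dispersion and sample-specific probability parameters), the sketch below simulates a genes-by-samples matrix through the standard gamma-Poisson mixture form of the negative binomial distribution. The function names, the parameterization with mean r·p/(1−p), and the example values are assumptions made for illustration; the sketch does not include the paper's nonparametric priors or its posterior inference.

```python
import numpy as np

def simulate_nb_counts(r, p, seed=0):
    """Draw counts n[k, j] ~ NB(r[k], p[j]) for genes k and samples j.

    Assumed parameterization: lam ~ Gamma(shape=r, scale=p / (1 - p)),
    n ~ Poisson(lam), which gives E[n[k, j]] = r[k] * p[j] / (1 - p[j]).
    """
    rng = np.random.default_rng(seed)
    r = np.asarray(r, dtype=float)[:, None]  # gene-specific dispersion, shape (K, 1)
    p = np.asarray(p, dtype=float)[None, :]  # sample-specific probability, shape (1, J)
    lam = rng.gamma(shape=r, scale=p / (1.0 - p))  # broadcasts to shape (K, J)
    return rng.poisson(lam)

def expected_counts(r, p):
    """Mean of NB(r[k], p[j]) under the parameterization assumed above."""
    r = np.asarray(r, dtype=float)[:, None]
    p = np.asarray(p, dtype=float)[None, :]
    return r * p / (1.0 - p)

# Hypothetical example: three genes, two samples with different depths.
counts = simulate_nb_counts(r=[1.0, 5.0, 20.0], p=[0.5, 0.8])
```

In this form a larger p[j] scales up every gene's expected count in sample j, which is how a sample-specific probability parameter can absorb sequencing depth without a separate normalization step.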

    Bayesian Analysis of High-Throughput Sequencing Data

    We develop a Bayesian framework for the analysis of high-throughput sequencing count data under a variety of settings, removing sophisticated ad-hoc pre-processing steps commonly required in existing algorithms. Specifically, we start by exploiting Bayesian nonparametric priors, including the gamma-Poisson, gamma-negative binomial, and beta-negative binomial processes, to model RNA sequencing (RNA-seq) count matrices. We then develop a novel Bayesian negative binomial regression (BNB-R) method for the analysis of RNA-seq count data. In particular, the natural model parameterization removes the need for a normalization step, while the method is capable of handling complex experimental designs involving multivariate dependence structures. In addition to studying genes individually, investigating coordinated expression variations of genes may help reveal the underlying cellular mechanisms, leading to better understanding and more effective prognosis and intervention strategies. In chapter 4, we develop a fully Bayesian covariate-dependent negative binomial factor analysis method (dNBFA) for RNA-seq count data, to capture coordinated gene expression changes while accounting for the effects of covariates that reflect different influencing factors. Finally, in the last chapter, we propose a fully generative hierarchical gamma-negative binomial (hGNB) model of single-cell RNA-seq (scRNA-seq) data, obviating the need for explicitly modeling zero inflation. hGNB can naturally account for covariate effects at both the gene and cell levels to identify complex latent representations of scRNA-seq data, without the need for commonly adopted pre-processing steps such as normalization.
